Note: When clicking on a Digital Object Identifier (DOI) number, you will be taken to an external site maintained by the publisher. Some full-text articles may not yet be available without charge during the embargo (administrative interval).
Some links on this page may take you to non-federal websites. Their policies may differ from those of this site.
- Free, publicly-accessible full text available February 1, 2026
- Large language models (LLMs) are notoriously memory-intensive during training, particularly with the popular AdamW optimizer. This memory burden necessitates using more or higher-end GPUs or reducing batch sizes, limiting training scalability and throughput. To address this, various memory-efficient optimizers have been proposed to reduce optimizer memory usage. However, they face critical challenges: (i) reliance on costly SVD operations; (ii) significant performance trade-offs compared to AdamW; and (iii) still-substantial optimizer memory overhead needed to maintain competitive performance. In this work, we identify that AdamW's learning rate adaptation rule can be effectively coarsened into a structured learning rate update. Based on this insight, we propose Approximated Gradient Scaling for Memory-Efficient LLM Optimization (APOLLO), which approximates learning rate scaling using an auxiliary low-rank optimizer state based on pure random projection. This structured learning rate update rule makes APOLLO highly tolerant to further memory reductions while delivering comparable pre-training performance. Even its rank-1 variant, APOLLO-Mini, achieves superior pre-training performance compared to AdamW with SGD-level memory costs. Extensive experiments demonstrate that the APOLLO series performs on par with or better than AdamW while achieving greater memory savings by nearly eliminating the optimizer states of AdamW. These savings provide significant system-level benefits: (1) Enhanced throughput: 3x throughput on an 8xA100-80GB setup compared to AdamW, by supporting 4x larger batch sizes. (2) Improved model scalability: pre-training LLaMA-13B with naive DDP on A100-80GB GPUs without system-level optimizations. (3) Low-end-GPU-friendly pre-training: pre-training LLaMA-7B on a single GPU using less than 12 GB of memory with weight quantization.
  Free, publicly-accessible full text available February 17, 2026
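  The core idea in this abstract (AdamW-style moments kept only in a low-rank space reached by a fixed random projection, then converted into a structured, channel-wise learning-rate scaling) can be sketched compactly. The PyTorch snippet below is a minimal illustration based on the abstract's description, not the authors' released code; the function name, state layout, and the exact norm-ratio scaling rule are assumptions.

  ```python
  import torch

  def apollo_step(param, grad, state, lr=1e-3, betas=(0.9, 0.999),
                  eps=1e-8, rank=1):
      """One illustrative APOLLO-style update for a 2-D weight (out, in).

      Adam-style moments live only in a rank-`rank` space obtained by a
      fixed random projection, so AdamW's full-rank optimizer states are
      never stored.
      """
      if "proj" not in state:
          # Pure random projection (no SVD), plus low-rank moment buffers.
          state["proj"] = torch.randn(grad.shape[1], rank,
                                      device=grad.device, dtype=grad.dtype)
          state["proj"] /= rank ** 0.5
          state["m"] = torch.zeros(grad.shape[0], rank,
                                   device=grad.device, dtype=grad.dtype)
          state["v"] = torch.zeros_like(state["m"])
          state["t"] = 0
      state["t"] += 1
      b1, b2 = betas

      g_low = grad @ state["proj"]                  # project gradient to rank r
      state["m"].mul_(b1).add_(g_low, alpha=1 - b1)
      state["v"].mul_(b2).addcmul_(g_low, g_low, value=1 - b2)
      m_hat = state["m"] / (1 - b1 ** state["t"])   # bias-corrected moments
      v_hat = state["v"] / (1 - b2 ** state["t"])
      update_low = m_hat / (v_hat.sqrt() + eps)     # Adam-style low-rank update

      # Channel-wise scaling: how strongly Adam would rescale each output
      # channel relative to the raw gradient, estimated in the low-rank space.
      scale = update_low.norm(dim=1, keepdim=True) / (
          g_low.norm(dim=1, keepdim=True) + eps)
      param.data.add_(grad * scale, alpha=-lr)      # scaled plain-gradient step
  ```

  With rank=1, this corresponds to the APOLLO-Mini regime described above: the optimizer state shrinks to a handful of vectors per weight matrix, approaching SGD's memory footprint.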
- How do children's visual concepts change across childhood, and how might these changes be reflected in their drawings? Here we investigate developmental changes in children's ability to emphasize the relevant visual distinctions between object categories in their drawings. We collected over 13K drawings from children aged 2-10 years via a free-standing drawing station in a children's museum. We hypothesized that older children would produce more recognizable drawings, and that this gain in recognizability would not be entirely explained by concurrent development in visuomotor control. To measure recognizability, we applied a pretrained deep convolutional neural network to extract a high-level feature representation of all drawings, and then trained a multi-way linear classifier on these features. To measure visuomotor control, we developed an automated procedure to assess children's ability to accurately trace complex shapes. We found consistent gains in the recognizability of drawings across ages that were not fully explained by children's ability to accurately trace complex shapes. Furthermore, these gains were accompanied by an increase in how distinct different object categories were in feature space. Overall, these results demonstrate that children's drawings include more distinctive visual features as they grow older.
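  The two measurement steps described here (feature extraction with a pretrained network, then a multi-way linear readout) can be sketched as follows. This is an illustrative pipeline rather than the authors' code; the choice of VGG-19, the penultimate (fc7) feature layer, and the logistic-regression classifier are assumptions.

  ```python
  import torch
  import torchvision.models as models
  import torchvision.transforms as T
  from sklearn.linear_model import LogisticRegression
  from sklearn.model_selection import cross_val_score

  # Pretrained network, truncated before the final classification layer,
  # so it emits a 4096-dim high-level feature vector per drawing.
  vgg = models.vgg19(weights=models.VGG19_Weights.DEFAULT).eval()
  feature_extractor = torch.nn.Sequential(
      vgg.features, vgg.avgpool, torch.nn.Flatten(),
      *list(vgg.classifier.children())[:-1],
  )

  preprocess = T.Compose([
      T.Resize((224, 224)),
      T.ToTensor(),
      T.Normalize(mean=[0.485, 0.456, 0.406], std=[0.229, 0.224, 0.225]),
  ])

  @torch.no_grad()
  def extract_features(images):
      """images: list of PIL drawings -> (N, 4096) feature matrix."""
      batch = torch.stack([preprocess(im) for im in images])
      return feature_extractor(batch).numpy()

  def recognizability(features, labels):
      """Cross-validated accuracy of a multi-way linear classifier that
      predicts the intended category from drawing features."""
      clf = LogisticRegression(max_iter=1000)
      return cross_val_score(clf, features, labels, cv=5).mean()
  ```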
- How do children's representations of object categories change as they grow older? As children learn about the world around them, they also express what they know in the drawings they make. Here, we examine drawings as a window into how children represent familiar object categories, and how this changes across childhood. We asked children (age 3-10 years) to draw familiar object categories on an iPad. First, we analyzed the drawings' semantic content, finding large and consistent gains in how well children could produce drawings that are recognizable to adults. Second, we quantified their perceptual similarity to adult drawings using a pretrained deep convolutional neural network, allowing us to visualize the representational layout of object categories across age groups using a common feature basis. We found that the organization of object categories in older children's drawings was more similar to that of adults than was the organization in younger children's drawings. This correspondence was strong in the final layers of the neural network, showing that older children's drawings tend to capture the perceptual features critical for adult recognition. We hypothesize that this improvement reflects increasing convergence between children's and adults' representations of object categories; future work will examine how these age-related changes relate to children's developing perceptual and motor capacities. Broadly, these findings point to drawing as a rich source of insight into how children represent object concepts.
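  One way to quantify the layer-wise correspondence described above is to compare representational dissimilarity matrices (RDMs) of category-mean features between each age group and adults. The sketch below is an illustrative analysis, not the authors' code; `layer_features` is a hypothetical dict mapping each network layer to per-group (n_categories, n_features) matrices of category-average activations.

  ```python
  from scipy.spatial.distance import pdist
  from scipy.stats import spearmanr

  def rdm(category_means):
      """Representational dissimilarity: pairwise correlation distance
      between category-mean feature vectors."""
      return pdist(category_means, metric="correlation")

  def layout_similarity(layer_features, group, reference="adults"):
      """Spearman correlation between a group's RDM and the adult RDM,
      computed per network layer; higher = more adult-like layout."""
      return {
          layer: spearmanr(rdm(feats[group]), rdm(feats[reference])).correlation
          for layer, feats in layer_features.items()
      }
  ```

  A rising correlation in the network's final layers, across age groups, would correspond to the pattern reported above: older children's drawings increasingly capture the high-level features adults use for recognition.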